library(tidyverse) # for data wrangling etc
library(cmdstanr) # for cmdstan
library(brms) # for fitting models in STAN
library(standist) # for exploring distributions
library(coda) # for diagnostics
library(bayesplot) # for diagnostics
library(ggmcmc) # for MCMC diagnostics
library(DHARMa) # for residual diagnostics
library(rstan) # for interfacing with STAN
library(emmeans) # for marginal means etc
library(broom) # for tidying outputs
library(tidybayes) # for more tidying outputs
library(HDInterval) # for HPD intervals
library(ggeffects) # for partial plots
library(broom.mixed) # for summarising models
library(posterior) # for posterior draws
library(ggeffects) # for partial effects plots
library(patchwork) # for multi-panel figures
library(bayestestR) # for ROPE
library(see) # for some plots
library(easystats) # framework for stats, modelling and visualisation
library(mgcv)
library(gratia)
theme_set(theme_grey()) # put the default ggplot theme back
source("helperFunctions.R")Bayesian GAM Part3
1 Preparations
Load the necessary libraries
2 Scenario
The Australian Institute of Marine Science (AIMS) have a long-term inshore marine water quality monitoring program in which water samples are collected and analysed from sites (reef.alias) across the GBR numerous times per year. The focus of this program is to report long-term condition and change in water quality parameters.
Although we do have latitude and longitudes, the nature of the spatial design predominantly reflects a series of transects that start near the mouth of a major river and extend northwards, yet mainly within the open coastal zone. As a result, this design is not well suited to any specific spatial analyses (since they are mainly one dimensional).
| LATITUDE | LONGITUDE | reef.alias | Water_Samples | Region | Subregion | Season | waterYear | DOC |
|---|---|---|---|---|---|---|---|---|
| -16.1 | 145. | Cape Trib… | AIMS | Wet T… | Barron D… | Dry | 2008 | 0.830 |
| -16.1 | 145. | Cape Trib… | AIMS | Wet T… | Barron D… | Wet | 2008 | 0.100 |
| -16.1 | 145. | Cape Trib… | AIMS | Wet T… | Barron D… | Dry | 2009 | 0.282 |
| -16.1 | 145. | Cape Trib… | AIMS | Wet T… | Barron D… | Wet | 2009 | 1.27 |
| -16.1 | 145. | Cape Trib… | AIMS | Wet T… | Barron D… | Dry | 2009 | 0.793 |
| -16.1 | 145. | Cape Trib… | AIMS | Wet T… | Barron D… | Dry | 2010 | 0.380 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| LATITUDE | - Latitudinal coordinate |
| LONGITUDE | - Longitudinal coordinate |
| reef.alias | - Internal AIMS reef name |
| Water_Samples | - Categorical label of who collected the data |
| Region | - The MMP region |
| Subregion | - The MMP subregion |
| Season | - A categorical listing of Wet or Dry |
| waterYear | - The water year (1st Oct - 30 Sept) to which the data are attached |
| Date | - The date the sample was collected |
| Mnth | - The month the sample was collected |
| DOC | - Nitrite and Nitrate |
3 Read in the data
Rows: 1560 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): reef.alias, Water_Samples, Region, Subregion, Season
dbl (5): LATITUDE, LONGITUDE, waterYear, Mnth, DOC
date (1): Date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
We will start by defining each of the categorical variables as factors. We will also define the random effect (reef.alias) as a factor.
4 Exploratory data analysis
Model formula: \[ y_i \sim{} \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i =\beta_0 + f(Date_i) + f(Month_i) \]
where \(\beta_0\) is the y-intercept. \(f(Date)\) and \(f(Month)\) indicate the additive smoothing functions of the long-term temporal trends and the annual seasonal trends respectively.
If we begin with a quck scatterplot of DOC agains Date..
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).
We know that these DOC values have been measured over time from a number of locations (reef.alias). Therefore, it would be good to facet the scatterplot conditional on the reef.alias.
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).
Conclusions:
- evidently, some locations (reef.alias) have longer time series than others
- the amount and variability of DOC also varies greatly between locations.
- the temporal trends are clearly not linear and possibly more complex than simple polynomials - it is likely that splines will be useful
- assumptions of normality are difficult to assess, however it would seem logical that the data will not follow a Gaussian distribution (since values less than 0 are not possible and it does appear that there is a relationship between mean and variance).
ggplot(wq, aes(y = DOC, x = Date)) +
geom_point() +
geom_smooth() +
scale_y_log10() +
scale_y_continuous(trans = scales::pseudo_log_trans()) +
facet_wrap(~reef.alias, scales = "free_y")Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 8 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 8 rows containing missing values or values outside the scale range
(`geom_point()`).
Conclusions:
- although the loess smoothers are not particularly appropriate for non-Gaussian data, they nevertheless help us see that there are both similarities and differnces in the trends between locations. Most of the locations that have a long time series show some sort of peak in DOC around about 2014
5 Data preparations
Date data cannot be directly modelled, it must be converted into a numeric. This can be performed with the decimal_date() function within the lubridate package.
Although we generally want to scale any continuous predictors, it appears that doing so with Date objects has downstream issues - the models fit ok, but partial plots are displaced along the x-axis. So as an alternative either dont scale, or do so with a routine that does not automatically back-trasform (such as sjmisc::std or sjmisc::center).
6 Simple model (High West only)
wq.sub <- wq |>
filter(reef.alias == "High West") |>
droplevels()
ggplot(wq.sub, aes(y = DOC, x = Date)) +
geom_point() +
geom_smooth() +
scale_y_log10()`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
ggplot(wq.sub, aes(y = DOC, x = Date)) +
geom_point() +
geom_smooth(method = "gam", formula = y ~ s(x), method.args = list(family = "tw")) +
scale_y_log10()Conclusions:
- of concern in the above figure is the set of DOC values of 0.01 at the start of the time series. These values represent the limit of detection of the instrumentation used for measuring DOC at the time. Any value measured at 0.01 or less was just recorded as 0.01.
- ideally, such cases should be modelled as censored so as to acknowledge that they are at or below 0.01 and not exactly 0.01 (with no variation). Unfortunately, this is not available with the
mgcvpackage (or any frequentist package for fitting GAM’s that I am aware of). - options for dealing with such circumstances very much depend on the reliability of these values. If we believe that they are approximately representative, we might elect to keep them in an model as normal. On the other hand, if the reliability of the measurements is questionable, we might elect to exclude these values.
- in the absence of any additional information about these observations, we will keep them in.
prior class coef group resp dpar nlpar lb ub
(flat) b
(flat) b sDt.num_1
student_t(3, 6.8, 2.5) Intercept
student_t(3, 0, 2.5) sds 0
student_t(3, 0, 2.5) sds s(Dt.num) 0
gamma(0.01, 0.01) shape 0
source
default
(vectorized)
default
default
(vectorized)
default
wq.brm <- brm(wq.form,
data = wq.sub,
prior = prior(normal(0, 2.5), class = "b"),
sample_prior = "only",
iter = 5000,
warmup = 1000,
chains = 3,
thin = 5,
backend = "rstan",
refresh = 0
)Compiling Stan program...
Start sampling
Warning: There were 1512 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
conditional_effects
The following link provides some guidance about defining priors. [https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations]
Sample prior only
I will also overlay the raw data for comparison.
wq.form <- bf(DOC ~ s(scale(Dt.num)), family = Gamma(link = "log"))
wq.sub |> summarise(
median(log(DOC)), mad(log(DOC)),
mad(log(DOC)) / mad(scale(Dt.num))
)# A tibble: 1 × 3
`median(log(DOC))` `mad(log(DOC))` `mad(log(DOC))/mad(scale(Dt.num))`
<dbl> <dbl> <dbl>
1 6.85 0.126 0.106
prior class coef group resp dpar nlpar lb ub
(flat) b
(flat) b sscaleDt.num_1
student_t(3, 6.8, 2.5) Intercept
student_t(3, 0, 2.5) sds 0
student_t(3, 0, 2.5) sds s(scale(Dt.num)) 0
gamma(0.01, 0.01) shape 0
source
default
(vectorized)
default
default
(vectorized)
default
priors <- prior(normal(7, 0.2), class = "Intercept") +
prior(normal(0, 2), class = "b") +
prior(gamma(0.01, 0.01), class = "shape") +
prior(student_t(3, 0, 1), class = "sds")
wq.brm2 <- brm(wq.form,
data = wq.sub,
prior = priors,
sample_prior = "only",
iter = 5000,
warmup = 1000,
thin = 5,
chains = 3, cores = 3,
backend = "rstan",
seed = 123,
control = list(adapt_delta = 0.99, max_treedepth = 20),
refresh = 0
)Compiling Stan program...
Start sampling
Warning: There were 3 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
Sample prior and posterior
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
Sample prior only
I will also overlay the raw data for comparison.
wq.form <- bf(DOC ~ gp(Dt.num), family = Gamma(link = "log"))
wq.sub |> summarise(
median(log(DOC)), mad(log(DOC)),
mad(log(DOC)) / mad(scale(Dt.num))
)# A tibble: 1 × 3
`median(log(DOC))` `mad(log(DOC))` `mad(log(DOC))/mad(scale(Dt.num))`
<dbl> <dbl> <dbl>
1 6.85 0.126 0.106
prior class coef group resp dpar nlpar lb ub
student_t(3, 6.8, 2.5) Intercept
(flat) lscale 0
inv_gamma(1.494197, 0.056607) lscale gpDt.num 0
student_t(3, 0, 2.5) sdgp 0
student_t(3, 0, 2.5) sdgp gpDt.num 0
gamma(0.01, 0.01) shape 0
source
default
default
default
default
(vectorized)
default
priors <- prior(normal(7, 0.2), class = "Intercept") +
prior(gamma(0.01, 0.01), class = "shape") +
prior(inv_gamma(1.5, 0.05), class = "lscale") +
prior(student_t(3, 0, 1), class = "sdgp")
wq.brm2b <- brm(wq.form,
data = wq.sub,
prior = priors,
sample_prior = "only",
iter = 5000,
warmup = 1000,
thin = 5,
chains = 3, cores = 3,
backend = "rstan",
seed = 123,
control = list(adapt_delta = 0.99, max_treedepth = 20),
refresh = 0
)Warning: The global prior 'inv_gamma(1.5, 0.05)' of class 'lscale' will not be
used in the model as all related coefficients have individual priors already.
If you did not set those priors yourself, then maybe brms has assigned default
priors. See ?set_prior and ?default_prior for more details.
Compiling Stan program...
Start sampling
Warning: There were 2 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
Sample prior and posterior
Warning: The global prior 'inv_gamma(1.5, 0.05)' of class 'lscale' will not be
used in the model as all related coefficients have individual priors already.
If you did not set those priors yourself, then maybe brms has assigned default
priors. See ?set_prior and ?default_prior for more details.
Warning in gsub("\\[[[:digit:]]+\\]", paste0("__", ind), par): argument
'replacement' has length > 1 and only the first element will be used
The desired updates require recompiling the model
Warning: The global prior 'inv_gamma(1.5, 0.05)' of class 'lscale' will not be
used in the model as all related coefficients have individual priors already.
If you did not set those priors yourself, then maybe brms has assigned default
priors. See ?set_prior and ?default_prior for more details.
Warning: argument 'replacement' has length > 1 and only the first element will
be used
Compiling Stan program...
Start sampling
[1] "b_Intercept" "sdgp_gpDt.num"
[3] "lscale_gpDt.num" "shape"
[5] "Intercept" "zgp_gpDt.num[1]"
[7] "zgp_gpDt.num[2]" "zgp_gpDt.num[3]"
[9] "zgp_gpDt.num[4]" "zgp_gpDt.num[5]"
[11] "zgp_gpDt.num[6]" "zgp_gpDt.num[7]"
[13] "zgp_gpDt.num[8]" "zgp_gpDt.num[9]"
[15] "zgp_gpDt.num[10]" "zgp_gpDt.num[11]"
[17] "zgp_gpDt.num[12]" "zgp_gpDt.num[13]"
[19] "zgp_gpDt.num[14]" "zgp_gpDt.num[15]"
[21] "zgp_gpDt.num[16]" "zgp_gpDt.num[17]"
[23] "zgp_gpDt.num[18]" "zgp_gpDt.num[19]"
[25] "zgp_gpDt.num[20]" "zgp_gpDt.num[21]"
[27] "zgp_gpDt.num[22]" "zgp_gpDt.num[23]"
[29] "zgp_gpDt.num[24]" "zgp_gpDt.num[25]"
[31] "zgp_gpDt.num[26]" "zgp_gpDt.num[27]"
[33] "zgp_gpDt.num[28]" "zgp_gpDt.num[29]"
[35] "zgp_gpDt.num[30]" "zgp_gpDt.num[31]"
[37] "zgp_gpDt.num[32]" "zgp_gpDt.num[33]"
[39] "zgp_gpDt.num[34]" "zgp_gpDt.num[35]"
[41] "zgp_gpDt.num[36]" "zgp_gpDt.num[37]"
[43] "zgp_gpDt.num[38]" "zgp_gpDt.num[39]"
[45] "zgp_gpDt.num[40]" "zgp_gpDt.num[41]"
[47] "zgp_gpDt.num[42]" "zgp_gpDt.num[43]"
[49] "zgp_gpDt.num[44]" "zgp_gpDt.num[45]"
[51] "zgp_gpDt.num[46]" "zgp_gpDt.num[47]"
[53] "zgp_gpDt.num[48]" "zgp_gpDt.num[49]"
[55] "zgp_gpDt.num[50]" "zgp_gpDt.num[51]"
[57] "zgp_gpDt.num[52]" "zgp_gpDt.num[53]"
[59] "zgp_gpDt.num[54]" "zgp_gpDt.num[55]"
[61] "zgp_gpDt.num[56]" "zgp_gpDt.num[57]"
[63] "zgp_gpDt.num[58]" "zgp_gpDt.num[59]"
[65] "zgp_gpDt.num[60]" "zgp_gpDt.num[61]"
[67] "zgp_gpDt.num[62]" "zgp_gpDt.num[63]"
[69] "prior_Intercept" "prior_sdgp"
[71] "prior_lscale__1_gpDt.num" "prior_shape"
[73] "lprior" "lp__"
[75] "accept_stat__" "stepsize__"
[77] "treedepth__" "n_leapfrog__"
[79] "divergent__" "energy__"
7 MCMC sampling diagnostics
8 Model validation
9 Partial effects plots
10 Model investigation
Family: gamma
Links: mu = log; shape = identity
Formula: DOC ~ s(scale(Dt.num))
Data: wq.sub (Number of observations: 63)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Smoothing Spline Hyperparameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sds(sscaleDt.num_1) 1.16 0.70 0.22 2.82 1.00 1165 1987
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 6.85 0.02 6.82 6.88 1.00 2137 2369
sscaleDt.num_1 0.43 1.00 -1.73 2.33 1.00 2093 2147
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 71.63 15.83 45.78 105.72 1.00 1703 2113
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
sds(sx_1) is the sd of the smooth weights (spline coefficients). This determines the amount of ‘wiggliness’, in an analogous way to how the sd of group-level effects in a varying slopes and intercepts model determine the amount of variability among groups in slopes and intercepts. However, the actual numeric value of the sds() is not very practically interpretable, because thinking about the variance of smooth weights for any given data and model seems abstract to me. However, if the value is around zero, then this is like ‘complete-pooling’ of the basis functions, which means that there isn’t much added value of more than a single basis function.
sx_1 is the unpenalized weight (ie coefficient) for one of the “natural” parameterized basis functions. The rest of the basis functions are like varying effects. Again, because the actual numeric value of sxs_1 is the value for the unpenalized coefficient for one of the basis functions, this wouldn’t seem to have a lot of practically interpretable meaning just from viewing this number.
wq.brm3 |>
as.data.frame() |>
dplyr::select(matches("^b_.*|^bs.*|^sds.*|^sigma$|^s_s.*")) |>
summarise_draws(
median,
HDInterval::hdi
)# A tibble: 11 × 4
variable median lower upper
<chr> <dbl> <dbl> <dbl>
1 b_Intercept 6.85 6.82 6.88
2 bs_sscaleDt.num_1 0.468 -1.68 2.37
3 sds_sscaleDt.num_1 1.04 0.128 2.45
4 s_sscaleDt.num_1[1] -0.645 -3.97 1.04
5 s_sscaleDt.num_1[2] 0.0982 -0.850 1.08
6 s_sscaleDt.num_1[3] 0.165 -0.313 0.671
7 s_sscaleDt.num_1[4] 0.394 -0.193 0.936
8 s_sscaleDt.num_1[5] -0.0881 -1.24 0.894
9 s_sscaleDt.num_1[6] -1.23 -2.96 0.160
10 s_sscaleDt.num_1[7] -1.59 -5.42 0.496
11 s_sscaleDt.num_1[8] -0.394 -2.52 0.989
11 Explore more models
wq.form <- bf(DOC ~ s(Dt.num, by = Season), family = Gamma(link = "log"))
priors <- prior(normal(7, 0.2), class = "Intercept") +
prior(normal(0, 2), class = "b") +
prior(gamma(0.01, 0.01), class = "shape") +
prior(student_t(3, 0, 10), class = "sds")
wq.brm4 <- brm(wq.form,
data = wq.sub,
prior = priors,
sample_prior = "only",
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
seed = 123,
control = list(adapt_delta = 0.99, max_treedepth = 20),
refresh = 0
)Compiling Stan program...
Start sampling
Warning: There were 5 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
wq.brm4 |>
conditional_effects(effects = "Dt.num:Season") |>
plot(points = TRUE) |>
_[[1]] +
scale_y_log10()Sample prior and posterior
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
Warning: There were 5 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
[1] "b_Intercept" "bs_sDt.num:SeasonDry_1"
[3] "bs_sDt.num:SeasonWet_1" "sds_sDt.numSeasonDry_1"
[5] "sds_sDt.numSeasonWet_1" "shape"
[7] "Intercept" "s_sDt.numSeasonDry_1[1]"
[9] "s_sDt.numSeasonDry_1[2]" "s_sDt.numSeasonDry_1[3]"
[11] "s_sDt.numSeasonDry_1[4]" "s_sDt.numSeasonDry_1[5]"
[13] "s_sDt.numSeasonDry_1[6]" "s_sDt.numSeasonDry_1[7]"
[15] "s_sDt.numSeasonDry_1[8]" "s_sDt.numSeasonWet_1[1]"
[17] "s_sDt.numSeasonWet_1[2]" "s_sDt.numSeasonWet_1[3]"
[19] "s_sDt.numSeasonWet_1[4]" "s_sDt.numSeasonWet_1[5]"
[21] "s_sDt.numSeasonWet_1[6]" "s_sDt.numSeasonWet_1[7]"
[23] "s_sDt.numSeasonWet_1[8]" "prior_Intercept"
[25] "prior_bs" "prior_sds_sDt.numSeasonDry"
[27] "prior_sds_sDt.numSeasonWet" "prior_shape"
[29] "lprior" "lp__"
[31] "accept_stat__" "stepsize__"
[33] "treedepth__" "n_leapfrog__"
[35] "divergent__" "energy__"
Error in model.matrix.default(f, dat): model frame and formula mismatch in model.matrix()
wq.resids <- make_brms_dharma_res(wq.brm5, integerResponse = FALSE)
wrap_elements(~ testUniformity(wq.resids)) +
wrap_elements(~ plotResiduals(wq.resids, form = factor(rep(1, nrow(wq.sub))))) +
wrap_elements(~ plotResiduals(wq.resids, quantreg = TRUE)) +
wrap_elements(~ testDispersion(wq.resids))
Durbin-Watson test
data: simulationOutput$scaledResiduals ~ 1
DW = 1.3984, p-value = 0.01408
alternative hypothesis: true autocorrelation is not 0
resids1 <- recalculateResiduals(wq.resids, group = wq.sub$Dt.num, aggregateBy = mean)
testTemporalAutocorrelation(resids1, time = wq.sub$Dt.num)
Durbin-Watson test
data: simulationOutput$scaledResiduals ~ 1
DW = 2.0971, p-value = 0.6987
alternative hypothesis: true autocorrelation is not 0
wq.form <- bf(DOC ~ s(Dt.num) + s(Mnth, bs = "cc", k = 6),
family = Gamma(link = "log")
)
priors <- prior(normal(7, 0.15), class = "Intercept") +
prior(normal(0, 2), class = "b") +
prior(gamma(0.01, 0.01), class = "shape") +
prior(student_t(3, 0, 10), class = "sds")
wq.brm6 <- brm(wq.form,
data = wq.sub,
prior = priors,
knots = list(Mnth = seq(1, 12, len = 6)),
sample_prior = "only",
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
seed = 123,
control = list(adapt_delta = 0.99, max_treedepth = 20),
refresh = 0
)Compiling Stan program...
Start sampling
Warning: There were 24 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
wq.brm6 |>
conditional_effects(effects = "Dt.num:Mnth") |>
plot(points = TRUE) |>
_[[1]] +
scale_y_log10()Sample prior and posterior
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
[1] "b_Intercept" "bs_sDt.num_1" "sds_sDt.num_1"
[4] "sds_sMnth_1" "shape" "Intercept"
[7] "s_sDt.num_1[1]" "s_sDt.num_1[2]" "s_sDt.num_1[3]"
[10] "s_sDt.num_1[4]" "s_sDt.num_1[5]" "s_sDt.num_1[6]"
[13] "s_sDt.num_1[7]" "s_sDt.num_1[8]" "s_sMnth_1[1]"
[16] "s_sMnth_1[2]" "s_sMnth_1[3]" "s_sMnth_1[4]"
[19] "prior_Intercept" "prior_bs" "prior_sds_sDt.num"
[22] "prior_sds_sMnth" "prior_shape" "lprior"
[25] "lp__" "accept_stat__" "stepsize__"
[28] "treedepth__" "n_leapfrog__" "divergent__"
[31] "energy__"
Error in model.matrix.default(f, dat): model frame and formula mismatch in model.matrix()
12 Mixed effects models (all reefs)
wq.sub <- wq |>
mutate(Dt.num = lubridate::decimal_date(Date)) |>
group_by(reef.alias) |>
mutate(Min = min(Dt.num)) |>
ungroup() |>
filter(Min < 2012, Region != "Fitzroy", reef.alias != "Daydream") |>
droplevels()
## reef=wq |>
## group_by(reef.alias) |>
## dplyr:::summarise(Min=min(Dt.num)) |>
## filter(Min<2012) |>
## pull(reef.alias)
## reef
## wq2=wq |> filter(reef.alias %in% reef) |> droplevels
ggplot(wq.sub, aes(y = DOC, x = Dt.num)) +
geom_point() +
facet_wrap(~reef.alias, scales = "free_y") +
geom_smooth() +
scale_y_log10()`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 3 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 3 rows containing missing values or values outside the scale range
(`geom_point()`).
wq.form <- bf(DOC ~ s(Dt.num) + (1 | reef.alias),
family = Gamma(link = "log")
)
get_prior(wq.form, data = wq.sub)Warning: Rows containing NAs were excluded from the model.
prior class coef group resp dpar nlpar lb ub
(flat) b
(flat) b sDt.num_1
student_t(3, 6.8, 2.5) Intercept
student_t(3, 0, 2.5) sd 0
student_t(3, 0, 2.5) sd reef.alias 0
student_t(3, 0, 2.5) sd Intercept reef.alias 0
student_t(3, 0, 2.5) sds 0
student_t(3, 0, 2.5) sds s(Dt.num) 0
gamma(0.01, 0.01) shape 0
source
default
(vectorized)
default
default
(vectorized)
(vectorized)
default
(vectorized)
default
priors <- prior(normal(7, 0.2), class = "Intercept") +
prior(normal(0, 2), class = "b") +
prior(gamma(0.01, 0.01), class = "shape") +
prior(student_t(3, 0, 2), class = "sds") +
prior(student_t(3, 0, 0.2), class = "sd")
wq.brm8 <- brm(wq.form,
data = wq.sub,
prior = priors,
sample_prior = "only",
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
seed = 123,
control = list(adapt_delta = 0.99, max_treedepth = 20),
refresh = 0
)Warning: Rows containing NAs were excluded from the model.
Compiling Stan program...
Start sampling
Warning: There were 3 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
wq.brm8 |>
conditional_effects(effects = "Dt.num") |>
plot(points = TRUE) |>
_[[1]] +
scale_y_log10()Sample prior and posterior
The following takes about x seconds per chain
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
[1] "b_Intercept"
[2] "bs_sDt.num_1"
[3] "sd_reef.alias__Intercept"
[4] "sds_sDt.num_1"
[5] "shape"
[6] "Intercept"
[7] "r_reef.alias[Cape.Tribulation,Intercept]"
[8] "r_reef.alias[Double.Cone,Intercept]"
[9] "r_reef.alias[Double.Island,Intercept]"
[10] "r_reef.alias[Dunk.North,Intercept]"
[11] "r_reef.alias[Fairlead.Buoy,Intercept]"
[12] "r_reef.alias[Fitzroy.West,Intercept]"
[13] "r_reef.alias[Franklands.West,Intercept]"
[14] "r_reef.alias[Green.Island,Intercept]"
[15] "r_reef.alias[High.West,Intercept]"
[16] "r_reef.alias[Magnetic,Intercept]"
[17] "r_reef.alias[Palms.West,Intercept]"
[18] "r_reef.alias[Pandora,Intercept]"
[19] "r_reef.alias[Pine,Intercept]"
[20] "r_reef.alias[Port.Douglas,Intercept]"
[21] "r_reef.alias[Yorkey's.Knob,Intercept]"
[22] "s_sDt.num_1[1]"
[23] "s_sDt.num_1[2]"
[24] "s_sDt.num_1[3]"
[25] "s_sDt.num_1[4]"
[26] "s_sDt.num_1[5]"
[27] "s_sDt.num_1[6]"
[28] "s_sDt.num_1[7]"
[29] "s_sDt.num_1[8]"
[30] "prior_Intercept"
[31] "prior_bs"
[32] "prior_sds_sDt.num"
[33] "prior_shape"
[34] "prior_sd_reef.alias"
[35] "lprior"
[36] "lp__"
[37] "accept_stat__"
[38] "stepsize__"
[39] "treedepth__"
[40] "n_leapfrog__"
[41] "divergent__"
[42] "energy__"
Error in model.matrix.default(f, dat): model frame and formula mismatch in model.matrix()
wq.resids <- make_brms_dharma_res(wq.brm9, integerResponse = FALSE)
wrap_elements(~ testUniformity(wq.resids)) +
wrap_elements(~ plotResiduals(wq.resids, quantreg = TRUE)) +
wrap_elements(~ testDispersion(wq.resids)) +
plot_layout(nrow = 2)12.1 Model 2
wq.form <- bf(DOC ~ s(Dt.num) + s(Mnth, bs = "cc", k = 6) + (1 | reef.alias),
family = Gamma(link = "log")
)
get_prior(wq.form, data = wq.sub)Warning: Rows containing NAs were excluded from the model.
prior class coef group resp
(flat) b
(flat) b sDt.num_1
student_t(3, 6.8, 2.5) Intercept
student_t(3, 0, 2.5) sd
student_t(3, 0, 2.5) sd reef.alias
student_t(3, 0, 2.5) sd Intercept reef.alias
student_t(3, 0, 2.5) sds
student_t(3, 0, 2.5) sds s(Dt.num)
student_t(3, 0, 2.5) sds s(Mnth, bs = "cc", k = 6)
gamma(0.01, 0.01) shape
dpar nlpar lb ub source
default
(vectorized)
default
0 default
0 (vectorized)
0 (vectorized)
0 default
0 (vectorized)
0 (vectorized)
0 default
priors <- prior(normal(7, 0.15), class = "Intercept") +
prior(normal(0, 2), class = "b") +
prior(gamma(0.01, 0.01), class = "shape") +
prior(student_t(3, 0, 10), class = "sds") +
prior(student_t(3, 0, 0.15), class = "sd")
wq.brm10 <- brm(wq.form,
data = wq.sub,
knots = list(Mnth = seq(1, 12, len = 6)),
prior = priors,
sample_prior = "only",
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
seed = 123,
control = list(adapt_delta = 0.99, max_treedepth = 20),
refresh = 0
)Warning: Rows containing NAs were excluded from the model.
Compiling Stan program...
Start sampling
Warning: There were 26 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
Warning: Examine the pairs() plot to diagnose sampling problems
wq.brm10 |>
conditional_effects(effects = "Dt.num") |>
plot(points = TRUE) |>
_[[1]] +
scale_y_log10()Sample prior and posterior
The following takes about x seconds per chain
The desired updates require recompiling the model
Compiling Stan program...
Start sampling
[1] "b_Intercept"
[2] "bs_sDt.num_1"
[3] "sd_reef.alias__Intercept"
[4] "sds_sDt.num_1"
[5] "sds_sMnth_1"
[6] "shape"
[7] "Intercept"
[8] "r_reef.alias[Cape.Tribulation,Intercept]"
[9] "r_reef.alias[Double.Cone,Intercept]"
[10] "r_reef.alias[Double.Island,Intercept]"
[11] "r_reef.alias[Dunk.North,Intercept]"
[12] "r_reef.alias[Fairlead.Buoy,Intercept]"
[13] "r_reef.alias[Fitzroy.West,Intercept]"
[14] "r_reef.alias[Franklands.West,Intercept]"
[15] "r_reef.alias[Green.Island,Intercept]"
[16] "r_reef.alias[High.West,Intercept]"
[17] "r_reef.alias[Magnetic,Intercept]"
[18] "r_reef.alias[Palms.West,Intercept]"
[19] "r_reef.alias[Pandora,Intercept]"
[20] "r_reef.alias[Pine,Intercept]"
[21] "r_reef.alias[Port.Douglas,Intercept]"
[22] "r_reef.alias[Yorkey's.Knob,Intercept]"
[23] "s_sDt.num_1[1]"
[24] "s_sDt.num_1[2]"
[25] "s_sDt.num_1[3]"
[26] "s_sDt.num_1[4]"
[27] "s_sDt.num_1[5]"
[28] "s_sDt.num_1[6]"
[29] "s_sDt.num_1[7]"
[30] "s_sDt.num_1[8]"
[31] "s_sMnth_1[1]"
[32] "s_sMnth_1[2]"
[33] "s_sMnth_1[3]"
[34] "s_sMnth_1[4]"
[35] "prior_Intercept"
[36] "prior_bs"
[37] "prior_sds_sDt.num"
[38] "prior_sds_sMnth"
[39] "prior_shape"
[40] "prior_sd_reef.alias"
[41] "lprior"
[42] "lp__"
[43] "accept_stat__"
[44] "stepsize__"
[45] "treedepth__"
[46] "n_leapfrog__"
[47] "divergent__"
[48] "energy__"
Error in model.matrix.default(f, dat): model frame and formula mismatch in model.matrix()
13 Find peak
newdata <- with(wq.sub, data.frame(
Dt.num = seq(min(2015), max(2020), length = 1000),
reef.alias = NA, Mnth = 5
))
wq.peak <-
add_epred_draws(
object = wq.brm11, newdata = newdata,
re_formula = NA,
ndraws = 1000
) |>
ungroup() |>
group_by(.draw) |>
# summarise(x = x[which.max(.epred)]) |>
mutate(diff = .epred - lag(.epred)) |>
summarise(Dt.num = Dt.num[which.min(abs(diff))]) |>
median_hdci(Dt.num, .width = 0.95)
wq.peak# A tibble: 1 × 6
Dt.num .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2019. 2017. 2020. 0.95 median hdci
## ## lets plot this
## data_gam.preds <-
## data_gam.brm11 |>
## add_epred_draws(newdata = newdata, object = _) |>
## ungroup() |>
## dplyr::select(-.row, -.chain, -.iteration) |>
## group_by(x) |>
## summarise_draws(median, HDInterval::hdi) |>
## ungroup() |>
## mutate(Flag = between(x, data_gam.peak$.lower, data_gam.peak$.upper),
## Grp = data.table::rleid(Flag)
## )
## data_gam.preds |> head()
## ggplot(data_gam.preds, aes(y = median, x = x)) +
## geom_line(aes(colour = Flag, group = Grp))+
## geom_ribbon(aes(ymin = lower, ymax = upper, fill = Flag, group = Grp), alpha = 0.2)wq.form <- bf(DOC ~ s(Dt.num, by = Region) + s(Mnth, bs = "cc", k = 6, by = Region) + (1 | reef.alias),
family = Gamma(link = "log")
)
get_prior(wq.form, data = wq.sub)Warning: Rows containing NAs were excluded from the model.
prior class coef
(flat) b
(flat) b sDt.num:RegionBurdekin_1
(flat) b sDt.num:RegionMackayWhitsunday_1
(flat) b sDt.num:RegionWetTropics_1
student_t(3, 6.8, 2.5) Intercept
student_t(3, 0, 2.5) sd
student_t(3, 0, 2.5) sd
student_t(3, 0, 2.5) sd Intercept
student_t(3, 0, 2.5) sds
student_t(3, 0, 2.5) sds s(Dt.num, by = Region)
student_t(3, 0, 2.5) sds s(Mnth, bs = "cc", k = 6, by = Region)
gamma(0.01, 0.01) shape
group resp dpar nlpar lb ub source
default
(vectorized)
(vectorized)
(vectorized)
default
0 default
reef.alias 0 (vectorized)
reef.alias 0 (vectorized)
0 default
0 (vectorized)
0 (vectorized)
0 default
priors <- prior(normal(7, 0.15), class = "Intercept") +
prior(normal(0, 2), class = "b") +
prior(gamma(0.01, 0.01), class = "shape") +
prior(student_t(3, 0, 10), class = "sds") +
prior(student_t(3, 0, 0.15), class = "sd")
wq.brm14 <- brm(wq.form,
data = wq.sub,
knots = list(Mnth = seq(1, 12, len = 6)),
prior = priors,
sample_prior = "yes",
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
backend = "rstan",
seed = 123,
control = list(adapt_delta = 0.99, max_treedepth = 20),
refresh = 0
)Warning: Rows containing NAs were excluded from the model.
Compiling Stan program...
Start sampling